1995-02-20
--------------------------------- VBENCH -----------------------------------
Requirements:
------------
* 80286 or higher processor
* DOS 2.0 or higher
* VGA compatible graphics card
To compile you need:
* C++ compiler -> BORLAND C++ 3.1
* Assembler -> TASM 3.1+
* Linker
* Mark Betz's HTimer class
Where to get: CompuServe -> the Gamer's forum, Game Design library
Files Included:
--------------
In addition to this file, the following files should also be in the
ZIP file:
* VBENCH.EXE - The executable video benchmark program
* VBENCH.CPP - The main C++ source module, takes care of
calling and calculating time for the benchmarks.
* BENCH.ASM - The benchmark Assembly language source module,
includes all benchmark code.
* VIDEO.ASM - The video mode setup and buffer management code.
* VBENCH.MAK - The make file used to compile program
* VBENCH.PRJ - The Borland C++ 3.1 project file used to compile program
Description:
-----------
The VBENCH program was developed for the prime purpose of comparing
different blit (block-transfer) techniques in both mode 13h and tweaked
mode (planar mode 13h with 4 pages). Hopefully, the benchmarks will
serve the purpose of helping graphics programmers choose the technique
that suits their application best, based on some of the timing results.
By no means should these results be used as the _sole_ reason for
choosing a technique, because there are many special cases that aren't
accounted for in the benchmarks.
Usage:
-----
To use the benchmark program, all that's required is that you type
in its name at the command line, like so:
VBENCH
Press a key at the prompt, and the tests will get underway. Depending
on your system, and on the number of tests being done, this may take
some time. Once they are finished, the program will exit and display
the benchmark results on your screen.
Benchmark Info:
---------------
- For Ram-to-Video, AND for Video-to-Ram benchmarks, a 64,016 byte
buffer was used as the Ram buffer, and is therefore named ram_buffer. The
extra 16 bytes on the end are for special non-aligned accesses. For
Video-to-Video benchmarks, I copied from higher addresses to lower
addresses, using an incrementing index (ex: the 1st copy moves from byte
4 to byte 2, the 2nd copy moves from byte 5 to byte 3, etc). I did this
because it seemed to be the only way to get the average speed of moves.
It may sound weird, but in my tests, on my system at least, I got much
faster speeds if I copied _ahead_ by, say, about 2 bytes, as compared to
copying ahead by 32 bytes or copying 'backwards'. I'd be interested in
hearing if the same thing happens for others out there. All you need to
do is change the video transfer functions in BENCH.ASM, and compare
those results to the 'backwards' moves.
- As of yet, all benchmarks are 64,000 byte moves, repeated 10
times. The functions are not _called_ 10 times, but the functions
are _performed_ 10 times. Perhaps in the future I will change this,
but it doesn't make a difference right now, since I am timing video
speed, not function speed.
No parameters were passed to the benchmark functions, and no
variables were accessed by the function. Each function was aligned
on a paragraph boundary. The program was compiled in COMPACT model,
so all CALLs were NEAR calls.
- One thing to note: all current functions are aligned optimally
for the type of move being done, assuming the type is word or dword.
The reason for this is that aligned moves work much faster than unaligned
moves, and you should avoid unaligned moves as best you can.
On my system, I have found that unaligned moves can be just as slow as,
or even slower than, BYTE moves.
- For mode 13h benchmarks, ram_buffer was treated as a 64,000 pixel
buffer set up just like the screen (linear bitmap).
- For Tweaked mode benchmarks, ram_buffer was treated as though it
was set up in a planar fashion, meaning every 16,000 bytes of the buffer
represented a different plane. This isn't a cheat, but an ideal setup
for Tweaked mode video transfers.
- I didn't do an interleaved write measure because I usually
code my loops to use REP MOVS instructions. The interleaved write
requires that you do a LOOP, which will always be slower than a REP
instruction, since REP doesn't need to reload the instruction pointer
and refetch the loop body on every pass. If you wanted to do a fair
comparison of interleaved and non-interleaved writes, you would want to
make them _BOTH_ contain LOOP instructions, avoiding the REP MOVS
instructions where possible.
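The two ram_buffer layouts described above (linear for mode 13h, planar
for Tweaked mode) can be sketched in C++ as follows. This is only an
illustration, not code from VBENCH. The function names are made up, and
the ordering of bytes inside each 16,000 byte plane (80 bytes per row,
pixel x stored in plane x AND 3) is an assumption that mirrors the usual
Tweaked mode video memory layout.

```cpp
#include <cassert>
#include <cstddef>

const int SCREEN_W = 320;
const int SCREEN_H = 200;
const int PLANES = 4;
const int BYTES_PER_ROW = SCREEN_W / PLANES;   // 80 bytes per row, per plane
const std::size_t PLANE_SIZE = 16000;          // bytes per plane (from the text)

// Index of pixel (x, y) in a linear buffer laid out like the mode 13h screen.
std::size_t linear_index(int x, int y)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    return (std::size_t)y * SCREEN_W + (std::size_t)x;
}

// Index of pixel (x, y) in a planar buffer: plane 0 first, then 1, 2, 3.
// ASSUMPTION: pixel x lives in plane (x & 3), at byte y*80 + x/4 of that plane.
std::size_t planar_index(int x, int y)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    std::size_t plane  = (std::size_t)(x & 3);
    std::size_t offset = (std::size_t)y * BYTES_PER_ROW + (std::size_t)(x >> 2);
    return plane * PLANE_SIZE + offset;
}

// Rearrange a 64,000 byte linear image into the planar layout.
void linear_to_planar(const unsigned char* linear, unsigned char* planar)
{
    for (int y = 0; y < SCREEN_H; ++y)
        for (int x = 0; x < SCREEN_W; ++x)
            planar[planar_index(x, y)] = linear[linear_index(x, y)];
}
```

With a buffer arranged this way, each plane can be sent to the card as
one contiguous 16,000 byte run, which is why the planar setup is ideal
for Tweaked mode transfers.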
Specific Benchmark Info:
* Shared Benchmarks (benchmarks done in both mode 13h and Tweaked mode)
- Byte Blit
Write to the screen, using BYTE moves.
- Word Blit
Write to the screen, using WORD moves.
- Word Read
Read from the screen, using WORD moves.
* Mode 13h-specific Benchmarks
- Word Video Transfer
Video Transfer (moving data using the video card as the source
and the destination), using WORD moves.
* Tweaked mode-specific Benchmarks
- Hardware Video Transfer
Video Transfer (moving data using the video card as the source
and the destination), with the video card in Write Mode 1,
using BYTE moves. Write mode 1 gives a hardware-assisted
move which allows 32 bits to be moved with one MOVSB instruction.
32 bits equals 4 pixels in tweaked mode.
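To see why one byte move can carry 4 pixels in Write Mode 1, here is a
toy C++ model of planar VGA memory. It is a simulation only, not code
from BENCH.ASM. The struct and function names are hypothetical, and the
model assumes all four planes are enabled for writing.

```cpp
#include <cassert>
#include <cstddef>

const int PLANES = 4;
const std::size_t PLANE_SIZE = 16000;   // bytes per plane for one 320x200 page

// Toy model: four memory planes plus one latch byte per plane.
struct PlanarVga {
    unsigned char plane[PLANES][PLANE_SIZE];
    unsigned char latch[PLANES];
};

// A CPU read of one video address loads all four latches at once.
void vga_read(PlanarVga& v, std::size_t addr)
{
    assert(addr < PLANE_SIZE);
    for (int p = 0; p < PLANES; ++p)
        v.latch[p] = v.plane[p][addr];
}

// In write mode 1 the CPU's data byte is ignored; the latch contents are
// stored instead. ASSUMPTION: every plane is enabled for writing.
void vga_write_mode1(PlanarVga& v, std::size_t addr)
{
    assert(addr < PLANE_SIZE);
    for (int p = 0; p < PLANES; ++p)
        v.plane[p][addr] = v.latch[p];
}

// One modeled MOVSB: a read from src followed by a write to dst.
// Four bytes (4 pixels in tweaked mode) move for one CPU byte access.
void copy_byte(PlanarVga& v, std::size_t src, std::size_t dst)
{
    vga_read(v, src);
    vga_write_mode1(v, dst);
}
```

In this model, a single copy_byte() call moves the byte at the same
address in all four planes, which is the 32 bits per MOVSB described
above.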
Adding Benchmarks:
-----------------
Adding a benchmark is basically straightforward. The main module,
VBENCH.CPP, includes a file called VBENCH.H. This file serves the
purpose of defining the benchmarks to be done by the main program.
There are two class definitions in VBENCH.H: one named SharedBenchData,
the other named BenchData. Each of these holds a description of the
benchmark in string form, and a pointer or pointers to the benchmark
function(s). The only thing these classes lack is the timing results
for each test (which are stored elsewhere in the program), but the
reason for this is to make adding more benchmarks less work.
Now, to add a benchmark test to the list, all you need to do is:
1) Define the function prototype. There are 2 different 'slots'
for function prototypes, one for mode 13h function prototypes,
and one for tweaked mode function prototypes. While you don't
have to put the prototypes in these places, it does help in
readability and organization.
2) Depending on the type of benchmark you are performing, do one of
the following:
A) For mode-specific benchmarks, find the correct list, either
the Tweaked mode or Mode 13h list, and add another BenchData
object to the list. To add another BenchData object, you must
define it like so:
function_address,bench_description
The function_address is just the benchmark function name,
without the parentheses(). The bench_description is a
description of the benchmark being performed, in a string
form. Example:
New13Blit,"A new blit function"
B) For shared benchmarks (benchmarks that can be performed in
both mode 13h and Tweaked mode), find the shared_benchmark list,
and add another SharedBenchData object to the list. To add
another SharedBenchData object, you must define it like so:
m13_function_address,tw_function_address,bench_description
The m13_function_address is just the mode 13h benchmark
function name, without the parentheses(), and the
tw_function_address is the Tweaked mode benchmark function
name. bench_description is a description of the benchmark
being performed. Example:
NewBlit13,NewBlitTw,"A new blit function"
Note: the bench_description strings are limited to 30 characters!
3) Include the benchmark functions file in the compiling project,
and then compile away!
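The steps above might look roughly like this in C++. VBENCH.H itself is
not reproduced in this file, so the struct layouts, the list name
m13_benchmarks, the helper functions, and the empty stub bodies are all
hypothetical. Only the entry format and the example names come from the
steps above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef void (*BenchFunc)();      // benchmarks take no parameters

// HYPOTHETICAL layouts; the real classes in VBENCH.H may differ.
struct BenchData {
    BenchFunc   func;             // function_address
    const char* description;      // bench_description
};

struct SharedBenchData {
    BenchFunc   m13_func;         // m13_function_address
    BenchFunc   tw_func;          // tw_function_address
    const char* description;      // bench_description
};

// Step 1: the benchmark functions (empty stubs so the sketch links).
void New13Blit() {}
void NewBlit13() {}
void NewBlitTw() {}

// Step 2A: a mode-specific list (list name is an assumption).
BenchData m13_benchmarks[] = {
    { New13Blit, "A new blit function" }
};

// Step 2B: the shared list.
SharedBenchData shared_benchmarks[] = {
    { NewBlit13, NewBlitTw, "A new blit function" }
};

// Descriptions are limited to 30 characters.
const std::size_t MAX_DESCRIPTION = 30;

bool description_fits(const char* text)
{
    return std::strlen(text) <= MAX_DESCRIPTION;
}

// Call every function in a mode-specific list once.
void run_mode_specific(const BenchData* list, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        assert(list[i].func != 0);    // each entry needs a function
        list[i].func();
    }
}
```

Keeping each entry to a function name plus a string is what makes adding
a benchmark a one-line change per list.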
Guidelines for creating benchmark functions:
1) Right now I only do a loop of 10 64,000 byte moves. This can be
performed using byte, word, or dword transfers. Just be sure to
indicate which kind was being done in the benchmark description.
2) I align all benchmark functions on a paragraph boundary, to make
sure the timed function is at optimal speed. (though it might not
make _too_much_ difference in time)
3) All system ram access is done on the ram_buffer which is located
in the Uninitialized Far Data segment, and also paragraph aligned.
4) No parameters are passed, and no variables accessed from within
the functions.
5) All functions requiring OUTs or similar setup activity do these
within the loop. Even if the OUT is needed only once, it still should
be included in the loop, to make sure no cheats are performed.
The loop I speak of is the outside loop, not the inner transfer
loops.
You can use the current benchmark code as a reference if you like.
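For readers without the assembly source at hand, here is a rough C++
sketch of a function that follows the guidelines above. It is not taken
from BENCH.ASM. fake_video, fake_out, ExampleByteBlit, and
bytes_per_second are made-up names, plain arrays stand in for video
memory, and the timer is assumed to report microseconds (the HTimer
class may use different units).

```cpp
#include <cassert>

const unsigned long PASSES         = 10UL;     // outer loop count (from the text)
const unsigned long BYTES_PER_PASS = 64000UL;  // bytes moved per pass (from the text)

// ASSUMPTION: plain arrays stand in for the real buffers so the sketch
// runs anywhere. In VBENCH the destination would be VGA memory.
static unsigned char ram_buffer[64016];   // 64,000 bytes plus 16 spare bytes
static unsigned char fake_video[64000];

// HYPOTHETICAL stand-in for an OUT instruction; it only counts calls.
static unsigned long port_writes = 0;
static void fake_out() { ++port_writes; }

// No parameters and no return value (guideline 4).
void ExampleByteBlit()
{
    for (unsigned long pass = 0; pass < PASSES; ++pass) {    // outer loop
        fake_out();                        // guideline 5: setup on every pass
        for (unsigned long i = 0; i < BYTES_PER_PASS; ++i)   // inner byte moves
            fake_video[i] = ram_buffer[i];
    }
}

// Turn an elapsed time into a transfer rate.
// ASSUMPTION: elapsed_us is in microseconds.
double bytes_per_second(unsigned long elapsed_us)
{
    assert(elapsed_us > 0);                // avoid dividing by zero
    return (double)(PASSES * BYTES_PER_PASS) * 1000000.0 / (double)elapsed_us;
}
```

Each run moves 10 x 64,000 = 640,000 bytes, so a run that took exactly
one second would work out to 640,000 bytes per second.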
Notes:
-----
* This program can be used and distributed without any worry. It is
asked, though, that it not be sold for profit. The benchmark can
be modified, but with these restrictions:
1) Adding more benchmarks to the program is allowed, so long as
the current benchmarks remain in the program.
2) Modifying the current benchmarks is allowed ONLY if you
contact the author of that benchmark. Right now, there's only
one programmer (me), but I hope that others will also contribute
to this benchmark program.
3) Modifying the _way_ in which the benchmarks are timed is allowed
ONLY if you contact me first (Dan Corritore)
* Actually, I need to find some way of making sure version numbers
and additions to the benchmarks are handled correctly, so for now,
don't upload any additions/changes until speaking with me (Dan Corritore).
* Eventually I will rewrite this documentation once other benchmarks
are added, or perhaps if the benchmark program is changed in any way.
Any help or suggestions with documentation layout and stuff would be
greatly appreciated, as I'm not the best documentor.
* Please, if you see a problem with the program code, or any mistakes,
or perhaps think I'm going about doing the benchmarks totally wrong,
let me know!
Email address(es):
-----------------
Dan Corritore, author of VBENCH 1.0:
CompuServe address: 70243,1110